1 |
Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0
In: ICASSP 2022 ; https://hal.archives-ouvertes.fr/hal-03544515 ; ICASSP 2022, May 2022, Singapore (2022)
3 |
Learning Audio-Video Language Representations
Abstract:
Automatic speech recognition has seen recent advancements powered by machine learning, but it is still only available for a small fraction of the more than 7,000 languages spoken worldwide due to the reliance on manually annotated speech data. Unlabeled multi-modal data, such as videos, are now increasingly available in many different languages and provide opportunities to scale speech technologies. In this thesis, we introduce models and datasets for learning visually grounded spoken language from raw audio in videos. We propose a self-supervised audio-video model that learns from the English narration naturally present in instructional videos to relate spoken words and sounds to visual content. Our model can recognize spoken words and natural sounds in audio queries to retrieve relevant visual clips, supporting its application to video search directly using audio and spoken queries, without needing to transcribe speech to text. We further demonstrate that our model can learn multilingual audio-video representations and can successfully perform retrieval on Japanese videos. Since our approach only requires audio-visual data without transcripts, we believe it is a promising direction to enable novel speech processing tools. ; M.Eng.
URL: https://hdl.handle.net/1721.1/139024
5 |
Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0 ...
6 |
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ...
7 |
Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation? ...
8 |
Mitigating Biases in Toxic Language Detection through Invariant Rationalization ...
10 |
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-02912029 ; Interspeech 2020, Oct 2020, Shanghai, China (2020)
11 |
Similarity Analysis of Contextual Word Representation Models ...
12 |
CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning ...
13 |
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning ...
14 |
What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context ...
16 |
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies ...
17 |
Improved Speech Representations with Multi-Target Autoregressive Predictive Coding ...
18 |
Classifying Alzheimer's Disease Using Audio and Text-Based Representations of Speech
In: Frontiers (2020)
19 |
Identification of digital voice biomarkers for cognitive health
In: Explor Med (2020)
20 |
On the Linguistic Representational Power of Neural Machine Translation Models
In: Computational Linguistics, Vol 46, Iss 1, Pp 1-52 (2020)